Abstract For a Markov transition kernel $P$ and a probability distribution $\mu$
on nonnegative integers, a time-sampled Markov chain evolves according to the
transition kernel $P_\mu = \sum_k \mu(k) P^k$. In this note we obtain CLT conditions for
time-sampled Markov chains and derive a spectral formula for the asymptotic
variance. Using these results we compare efficiency of Barker's and Metropolis
algorithms in terms of asymptotic variance.

1 Introduction

Let $P$ be an ergodic transition kernel of a Markov chain $(X_n)_{n \ge 0}$ with
limiting distribution $\pi$ on $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$ and let $f : \mathcal{X} \to \mathbb{R}$ be in $L^2(\pi)$.
A typical MCMC procedure for estimating $I = \int_{\mathcal{X}} f(x)\,\pi(dx)$ would use
$\hat{I}_n := \frac{1}{n} \sum_{i=0}^{n-1} f(X_i)$. Under appropriate assumptions on $P$ and $f$ a CLT holds
for $\hat{I}_n$, i.e.

$$\sqrt{n}\,(\hat{I}_n - I) \to N(0, \sigma^2_{f,P}), \qquad (1)$$

where the constant $\sigma^2_{f,P} < \infty$ is called the asymptotic variance and depends only
on $f$ and $P$.

The following theorem from [15] is a fundamental result on conditions that
guarantee (1) for reversible Markov chains.

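Theorem 1 ([15]) For a reversible and ergodic Markov chain, and a function
$f \in L^2(\pi)$, if

$$\mathrm{Var}(f, P) := \lim_{n \to \infty} n\,\mathrm{Var}_\pi(\hat{I}_n) < \infty, \qquad (2)$$

then (1) holds with

$$\sigma^2_{f,P} = \mathrm{Var}(f, P) = \int_{[-1,1]} \frac{1+x}{1-x}\, E_{f,P}(dx), \qquad (3)$$

where $E_{f,P}$ is the spectral measure associated with $f$ and $P$.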

We refer to (2) as the Kipnis-Varadhan condition. Assuming that (2) holds
and $P$ is reversible, in Section 2 we obtain conditions for the CLT and derive a
spectral formula for the asymptotic variance $\sigma^2_{f,P_\mu}$ of a time-sampled Markov
chain of the form

$$P_\mu := \sum_{k=0}^{\infty} \mu(k) P^k, \qquad (4)$$

where $\mu$ is a probability distribution on the nonnegative integers. Time-sampled
Markov chains are of theoretical interest in the context of petite sets (cf.
Chapter 5 of [20]), and also in the context of computational algorithms [27,28].
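
One transition of the time-sampled kernel in (4) can be simulated by drawing $K \sim \mu$ and applying $P$ exactly $K$ times. The following is a minimal Python sketch of that mechanism; the names `time_sampled_step`, `step`, `sample_k` and `lazy_sample_k` are hypothetical and introduced here only for illustration.

```python
import random

def time_sampled_step(x, step, sample_k, rng):
    """One transition of P_mu = sum_k mu(k) P^k: draw K ~ mu, then apply P K times."""
    k = sample_k(rng)            # K ~ mu on {0, 1, 2, ...}
    for _ in range(k):
        x = step(x, rng)         # one transition of the base kernel P
    return x

def lazy_sample_k(eps):
    """mu(0) = eps, mu(1) = 1 - eps, i.e. the lazy kernel eps*Id + (1 - eps)*P."""
    return lambda rng: 0 if rng.random() < eps else 1

# Toy deterministic "kernel" (an assumption, used only to make the mechanics visible).
rng = random.Random(0)
shift = lambda x, rng: x + 1
x_three = time_sampled_step(0, shift, lambda rng: 3, rng)        # mu = delta_3
x_lazy = time_sampled_step(0, shift, lazy_sample_k(0.5), rng)    # zero or one step of P
```

With a genuine Markov kernel in place of `shift`, repeated calls produce a trajectory of the time-sampled chain.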

Next we proceed to analyze the efficiency of Barker's algorithm [2]. Barker's
algorithm, like the Metropolis algorithm, uses an irreducible transition kernel $Q$ to
draw proposals. A move from $X_n = x$ to a proposal $Y_{n+1} = y$ is then accepted
with probability

$$\alpha^{(B)}(x, y) = \frac{\pi(y) q(y, x)}{\pi(y) q(y, x) + \pi(x) q(x, y)}, \qquad (5)$$

where $q(x, \cdot)$ is the transition density of $Q(x, \cdot)$. It is well known that with the
same proposal kernel $Q$, the Metropolis acceptance ratio results in a smaller
asymptotic variance than Barker's. In Section 3 we show that the asymptotic
variance of Barker's algorithm is not bigger than, roughly speaking, two times
that of Metropolis. We also motivate our considerations by recent advances in
exact MCMC for diffusion models. The theoretical results are illustrated by a
simulation study in Section 4.

2 Time-sampled Markov chains

In this section we work under the assumptions of Theorem 1, which imply that the
asymptotic variance $\sigma^2_{f,P}$ equals $\mathrm{Var}(f, P)$ defined in (2) and satisfies (3). For
other Markov chain CLT conditions we refer to [13, 25, 20, 4, 20].

Theorem 2 Let $P$ be a reversible and ergodic transition kernel with stationary
measure $\pi$, and let $f \in L^2(\pi)$. Assume that the Kipnis-Varadhan condition (2)
holds for $f$ and $P$. For a probability distribution $\mu$ on nonnegative integers,
let the time-sampled kernel $P_\mu$ be defined by (4). Then, if any of the following
conditions hold

(i) $\mu_{\mathrm{odd}} := \mu(\{1, 3, 5, \dots\}) > 0$,
(ii) $\mu(0) < 1$ and $P$ is geometrically ergodic,

the CLT holds for $f$ and $P_\mu$, moreover

$$\sigma^2_{f,P_\mu} = \int_{[-1,1]} \frac{1 + G_\mu(x)}{1 - G_\mu(x)}\, E_{f,P}(dx) < \infty, \qquad (6)$$

where $G_\mu$ is the probability generating function of $\mu$, i.e. $G_\mu(z) := \mathbb{E} z^K$,
$|z| \le 1$, $K \sim \mu$, and $E_{f,P}$ is the spectral measure associated with $f$ and $P$.

Remark 1 The condition $\mu_{\mathrm{odd}} > 0$ in the above result is necessary, which we
show below by means of a counterexample.
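
To make formula (6) concrete, the sketch below evaluates it for a spectral measure with finitely many atoms. This is an illustration under our own assumptions: the function name `asymptotic_variance` and the atoms and weights are hypothetical, and the atom at $-1$ is an idealization (an ergodic chain has no eigenvalue $-1$) chosen to show why spectral mass near $-1$ is harmful when $\mu_{\mathrm{odd}} = 0$.

```python
import math

def asymptotic_variance(atoms, weights, pgf=lambda x: x):
    """Evaluate (6) for the discrete measure sum_j weights[j] * delta_{atoms[j]}.

    pgf is G_mu; the default G_mu(x) = x (mu = delta_1) recovers formula (3) for P.
    """
    total = 0.0
    for x, w in zip(atoms, weights):
        g = pgf(x)
        if g >= 1.0:
            return math.inf      # mass where G_mu equals 1 makes the integral diverge
        total += w * (1.0 + g) / (1.0 - g)
    return total

# Hypothetical spectral measure with atoms at -1 and 0.5.
atoms, weights = [-1.0, 0.5], [0.3, 0.7]
var_p = asymptotic_variance(atoms, weights)                         # P itself
var_p2 = asymptotic_variance(atoms, weights, pgf=lambda x: x ** 2)  # mu = delta_2, mu_odd = 0
var_p3 = asymptotic_variance(atoms, weights, pgf=lambda x: x ** 3)  # mu = delta_3, mu_odd = 1
```

In this toy setting the even sampling distribution maps the atom at $-1$ to $G_\mu(-1) = 1$ and the integral diverges, whereas the odd one keeps it finite.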

Proof The proof is based on the functional analytic approach (see e.g. [15,
24]). Without loss of generality assume that $\pi f = 0$. A reversible transition
kernel $P$ with invariant distribution $\pi$ is a self-adjoint operator on $L^2_0(\pi) :=
\{f \in L^2(\pi) : \pi f = 0\}$ with spectral radius bounded by 1. By the spectral
decomposition theorem for self-adjoint operators, for each $f \in L^2_0(\pi)$ there
exists a finite positive measure $E_{f,P}$ on $[-1, 1]$, such that

$$\langle f, P^n f \rangle = \int_{[-1,1]} x^n\, E_{f,P}(dx),$$

for all integers $n \ge 0$. Thus in particular

$$\sigma^2_f = \pi f^2 = \int_{[-1,1]} 1\, E_{f,P}(dx) < \infty, \qquad (7)$$

and by [15] (cf. also Theorem 4 of [11]) one obtains

$$\sigma^2_{f,P} = \int_{[-1,1]} \frac{1+x}{1-x}\, E_{f,P}(dx) < \infty. \qquad (8)$$

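Since $P_\mu^n = \big(\sum_k \mu(k) P^k\big)^n$, by the spectral mapping theorem [9], we have

$$\langle f, P_\mu^n f \rangle = \int_{[-1,1]} x^n\, E_{f,P_\mu}(dx) = \int_{[-1,1]} \Big(\sum_k \mu(k) x^k\Big)^n E_{f,P}(dx) = \int_{[-1,1]} \big(G_\mu(x)\big)^n\, E_{f,P}(dx),$$

and consequently, applying the same argument as [15,11], we obtain

$$\sigma^2_{f,P_\mu} = \int_{[-1,1]} \frac{1+x}{1-x}\, E_{f,P_\mu}(dx) = \int_{[-1,1]} \frac{1 + G_\mu(x)}{1 - G_\mu(x)}\, E_{f,P}(dx). \qquad (9)$$

Now (9) gives the claimed formula but we need to prove that (9) is finite: by [15],
finiteness of the integral in (9) implies a CLT for $f$ and $P_\mu$. Observe that

$$|G_\mu(x)| \le 1 \quad \text{for all } x \in [-1, 1],$$
$$G_\mu(x) \le \mu(0) + x(1 - \mu(0)) \quad \text{for } x \ge 0.$$

Moreover, if (i) holds, then

$$G_\mu(x) \le \sum_{k \text{ even}} \mu(k) = 1 - \mu_{\mathrm{odd}} \quad \text{for } x \le 0,$$

hence we can write

$$\sigma^2_{f,P_\mu} \le \frac{1}{\mu_{\mathrm{odd}}} \int_{[-1,0]} 2\, E_{f,P}(dx) + \frac{1}{1 - \mu(0)} \int_{(0,1]} \frac{2}{1-x}\, E_{f,P}(dx). \qquad (10)$$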



The first integral in (10) is finite by (7) and the second by (8) and we are done 
with (i). 

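Next assume that (ii) holds. By $S(P)$ denote the spectrum of $P$ and let
$s_P := \sup\{|\lambda| : \lambda \in S(P)\}$ be the spectral radius. From [24] we know that since
$P$ is reversible and geometrically ergodic, it has a spectral gap, i.e. $s_P < 1$.
Hence for $x \in [-s_P, 0]$, we can write

$$G_\mu(x) \le \mu(0) + \sum_{k \text{ even},\, k \ge 2} \mu(k) x^k \le \mu(0) + s_P (1 - \mu(0)).$$

Consequently

$$\sigma^2_{f,P_\mu} = \int_{[-s_P,0]} \frac{1 + G_\mu(x)}{1 - G_\mu(x)}\, E_{f,P}(dx) + \int_{(0,s_P]} \frac{1 + G_\mu(x)}{1 - G_\mu(x)}\, E_{f,P}(dx)$$
$$\le \frac{1}{1 - \mu(0)} \int_{[-s_P,0]} \frac{2}{1 - s_P}\, E_{f,P}(dx) + \frac{1}{1 - \mu(0)} \int_{(0,s_P]} \frac{2}{1-x}\, E_{f,P}(dx). \qquad (11)$$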

The first integral in (11) is finite by (7) and the second by (8). 

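The most important special case of Theorem 2 is highlighted and computed
explicitly in the next corollary.

Corollary 1 Let $P$ be a reversible and ergodic transition kernel with stationary
measure $\pi$, and assume that for $f$ and $P$ the CLT (1) holds. For $\varepsilon \in (0, 1)$
let the lazy version of $P$ be defined as $P_\varepsilon := \varepsilon\,\mathrm{Id} + (1 - \varepsilon) P$. Then the CLT
holds for $f$ and $P_\varepsilon$ and

$$\sigma^2_{f,P_\varepsilon} = \frac{1}{1-\varepsilon}\,\sigma^2_{f,P} + \frac{\varepsilon}{1-\varepsilon}\,\sigma^2_f. \qquad (12)$$

Proof We use Theorem 2 with $\mu(0) = \varepsilon$, $\mu(1) = 1 - \varepsilon$. Hence $G_\mu(x) = \varepsilon + (1 - \varepsilon)x$,
and consequently

$$\sigma^2_{f,P_\varepsilon} = \int_{[-1,1]} \frac{1 + \varepsilon + (1-\varepsilon)x}{1 - \varepsilon - (1-\varepsilon)x}\, E_{f,P}(dx)
 = \int_{[-1,1]} \Big( \frac{1}{1-\varepsilon}\,\frac{1+x}{1-x} + \frac{\varepsilon}{1-\varepsilon} \Big) E_{f,P}(dx)$$
$$= \frac{1}{1-\varepsilon} \int_{[-1,1]} \frac{1+x}{1-x}\, E_{f,P}(dx) + \frac{\varepsilon}{1-\varepsilon} \int_{[-1,1]} 1\, E_{f,P}(dx)
 = \frac{1}{1-\varepsilon}\,\sigma^2_{f,P} + \frac{\varepsilon}{1-\varepsilon}\,\sigma^2_f.$$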
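
The identity (12) can be sanity-checked numerically by evaluating (6) with $G_\mu(x) = \varepsilon + (1-\varepsilon)x$ on a discrete spectral measure and comparing with the closed form. The sketch below does this; the atoms and weights are arbitrary hypothetical values and the function names are ours.

```python
def lazy_variance_via_pgf(atoms, weights, eps):
    """Formula (6) with G_mu(x) = eps + (1 - eps) * x on a discrete spectral measure."""
    total = 0.0
    for x, w in zip(atoms, weights):
        g = eps + (1.0 - eps) * x
        total += w * (1.0 + g) / (1.0 - g)
    return total

def lazy_variance_closed_form(atoms, weights, eps):
    """Formula (12): sigma^2_{f,P} / (1 - eps) + eps * sigma^2_f / (1 - eps)."""
    sigma2_f = sum(weights)                                                     # as in (7)
    sigma2_fp = sum(w * (1.0 + x) / (1.0 - x) for x, w in zip(atoms, weights))  # as in (8)
    return sigma2_fp / (1.0 - eps) + eps * sigma2_f / (1.0 - eps)

# Hypothetical spectral measure supported strictly inside (-1, 1).
atoms, weights = [-0.9, 0.2, 0.9], [0.2, 0.5, 0.3]
via_pgf = lazy_variance_via_pgf(atoms, weights, 0.5)
closed = lazy_variance_closed_form(atoms, weights, 0.5)
```

The two numbers agree up to floating point rounding, which mirrors the algebraic step in the proof above.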



Efficiency of time-sampled Markov chains can be compared using the following
corollary from Theorem 2.

Corollary 2 Let $P$ and $f$ be as in Theorem 2. If $P$ is positive as an operator
on $L^2(\pi)$ and $\mu_1$ dominates stochastically $\mu_2$ (i.e. $\mu_1 \ge_{st} \mu_2$), then $P_{\mu_1}$
dominates $P_{\mu_2}$ in the efficiency ordering, i.e. $\sigma^2_{f,P_{\mu_1}} \le \sigma^2_{f,P_{\mu_2}}$.

Proof If $P$ is positive self-adjoint then $\mathrm{supp}\, E_{f,P} \subseteq [0, 1]$. Moreover

$$\mu_1 \ge_{st} \mu_2 \;\Longrightarrow\; G_{\mu_1}(x) \le G_{\mu_2}(x) \quad \text{for } x \in [0, 1],$$

because $k \mapsto x^k$ is nonincreasing for $x \in [0, 1]$. The conclusion follows from (6).
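
As an illustration of the ordering used in this proof, the sketch below compares the generating functions of $1 + \mathrm{Pois}(1)$ and $1 + \mathrm{Pois}(5)$, the latter being stochastically larger. The PGF of $1 + \mathrm{Pois}(\lambda)$ is $x e^{\lambda(x-1)}$; the choice of these two distributions, the grid and the function names are our assumptions for the example.

```python
import math

def pgf_shifted_poisson(x, lam):
    """PGF of K = 1 + Poisson(lam): E x^K = x * exp(lam * (x - 1))."""
    return x * math.exp(lam * (x - 1.0))

def variance_factor(x, lam):
    """Integrand (1 + G_mu(x)) / (1 - G_mu(x)) of (6) at the spectral point x."""
    g = pgf_shifted_poisson(x, lam)
    return (1.0 + g) / (1.0 - g)

grid = [i / 100 for i in range(0, 100)]      # points in [0, 1)
ordered_on_positive_part = all(
    pgf_shifted_poisson(x, 5.0) <= pgf_shifted_poisson(x, 1.0) for x in grid
)
# At a negative spectral point the integrand need not be ordered the same way.
factor_small = variance_factor(-0.9, 1.0)
factor_large = variance_factor(-0.9, 5.0)
```

On the grid in $[0, 1)$ the stochastically larger distribution has the smaller PGF, while at $x = -0.9$ the integrand is larger for $1 + \mathrm{Pois}(5)$, which is why positivity of $P$ is needed.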

In another direction of studying CLTs, the variance bounding property
of Markov chains has been introduced in [26] and is defined as follows. $P$ is
variance bounding if there exists $K < \infty$ such that $\mathrm{Var}(f, P) \le K\,\mathrm{Var}_\pi(f)$
for all $f$. Here $\mathrm{Var}(f, P)$ is defined in (2) and $\mathrm{Var}_\pi(f) = \pi f^2 - (\pi f)^2$. We
prove that for time-sampled Markov chains the variance bounding property
propagates the same way the CLT does.

Theorem 3 Assume $P$ is reversible and variance bounding. Then $P_\mu$ is variance
bounding if any of the following conditions hold

(i) $\mu_{\mathrm{odd}} := \mu(\{1, 3, 5, \dots\}) > 0$,
(ii) $\mu(0) < 1$ and $P$ is geometrically ergodic.

Proof For any $f$ such that $\mathrm{Var}_\pi f < \infty$, the Kipnis-Varadhan condition holds
due to the variance bounding property of $P$ and thus the assumptions of Theorem 2
are met. Hence for every $f \in L^2(\pi)$ there is a CLT for $f$ and $P_\mu$. Therefore
$P_\mu$ is variance bounding by Theorem 7 of [26].

The next example shows that in the case of Markov chains that are not
geometrically ergodic, the condition $\mu_{\mathrm{odd}} > 0$ is necessary.

Example 1 We set $f(x) = x$ and give an example of an ergodic and reversible
transition kernel $P$ on $\mathcal{X} = [-1, 1]$ such that there is a CLT for $P$ and $f$
but not for $P^2$ and $f$. We shall rely on Theorem 4.1 of [4] that provides if and
only if conditions for Markov chain CLTs in terms of regenerations. It will be
apparent that the condition $\mu_{\mathrm{odd}} > 0$ in Theorem 2 is necessary.

Set $s(x) := \sqrt{1 - |x|}$, let $U(\cdot)$ be the uniform distribution on $[-1, 1]$, and
let the kernel $P$ be of the form

$$P(x, \cdot) = (1 - s(x))\,\delta_{-x}(\cdot) + s(x)\,U(\cdot), \quad \text{hence} \qquad (13)$$
$$P^2(x, \cdot) = (1 - s(x))^2\,\delta_{x}(\cdot) + (2s(x) - s(x)^2)\,U(\cdot). \qquad (14)$$

To find the stationary distribution of $P$ (and also $P^2$), we verify reversibility
with $\pi(x) \propto 1/s(x)$:

$$\pi(dx) P(x, dy) \propto \frac{1}{s(x)}\,\delta_{-x}(dy)\,dx - \delta_{-x}(dy)\,dx + U(dy)\,dx$$
$$= \frac{1}{s(y)}\,\delta_{-y}(dx)\,dy - \delta_{-y}(dx)\,dy + U(dx)\,dy \propto \pi(dy) P(y, dx).$$

Hence $\pi$ is a reflected Beta$(1, \frac{1}{2})$ distribution. Clearly $\pi(f^2) < \infty$.

Recall now the split chain construction [22, 1] of the bivariate Markov chain
$\{X_n, \Gamma_n\}$ on $\mathcal{X} \times \{0, 1\} = [-1, 1] \times \{0, 1\}$. If $(X_n)_{n \ge 0}$ evolves according to $P$
defined in (13), we have the following transition rule from $\{X_{n-1}, \Gamma_{n-1}\}$ to
$\{X_n, \Gamma_n\}$ for the split chain.

$$\check{P}(X_n \in \cdot \mid \Gamma_{n-1} = 1, X_{n-1} = x) = U(\cdot),$$
$$\check{P}(X_n \in \cdot \mid \Gamma_{n-1} = 0, X_{n-1} = x) = \delta_{-x}(\cdot),$$
$$\check{P}(\Gamma_n = 1 \mid \Gamma_{n-1}, X_n = x) = s(x),$$
$$\check{P}(\Gamma_n = 0 \mid \Gamma_{n-1}, X_n = x) = 1 - s(x).$$

The notation $\check{P}$ above indicates that we consider the extended probability
space for $(X_n, \Gamma_n)$, not the original one of $X_n$. The appropriate modification
of the above holds if the dynamics of $X_n$ is $P^2$, namely

$$\check{P}(X_n \in \cdot \mid \Gamma_{n-1} = 1, X_{n-1} = x) = U(\cdot),$$
$$\check{P}(X_n \in \cdot \mid \Gamma_{n-1} = 0, X_{n-1} = x) = \delta_{x}(\cdot),$$
$$\check{P}(\Gamma_n = 1 \mid \Gamma_{n-1}, X_n = x) = 2s(x) - s^2(x),$$
$$\check{P}(\Gamma_n = 0 \mid \Gamma_{n-1}, X_n = x) = (1 - s(x))^2.$$

We refer to the original papers for more details on the split chain construction
and to [4,25] for central limit theorems in this context. Denote

$$\tau := \min\{k \ge 0 : \Gamma_k = 1\}. \qquad (15)$$

By Theorem 4.1 of [4], the CLT for $P$ and $f$ holds if and only if the following
expression for the asymptotic variance is finite.

$$\sigma^2_{f,P} = \int_{[-1,1]} s(x)\pi(x)\,dx \;\; \mathbb{E}_U \Big( \sum_{k=0}^{\tau} f(X_k) \Big)^2, \qquad (16)$$

where $(X_n, \Gamma_n)$ follow the dynamics of $P$. Respectively, the CLT for $P^2$ and $f$
holds in our setting if and only if

$$\sigma^2_{f,P^2} = \int_{[-1,1]} \big(2s(x) - s^2(x)\big)\pi(x)\,dx \;\; \mathbb{E}_U \Big( \sum_{k=0}^{\tau} f(X_k) \Big)^2 \qquad (17)$$

is finite, where $(X_n, \Gamma_n)$ follow the dynamics of $P^2$.

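Now observe that if $(X_n)_{n \ge 0}$ evolves according to $P$, then $\big(\sum_{k=0}^{\tau} f(X_k)\big)^2$
equals $0$ if $\tau$ is odd, or $\big(\sum_{k=0}^{\tau} f(X_k)\big)^2 = X_0^2$ if $\tau$ is even. Consequently (16)
is finite. However, if $(X_n)_{n \ge 0}$ evolves according to $P^2$, then $\big(\sum_{k=0}^{\tau} f(X_k)\big)^2 =
(\tau + 1)^2 X_0^2$ and the distribution of $\tau$ is geometric with parameter $2s(X_0) -
s^2(X_0) = 1 - (1 - s(X_0))^2$. Therefore we compute $\sigma^2_{f,P^2}$ in (17) as

$$\sigma^2_{f,P^2} = \int_{[-1,1]} \big(2s(x) - s^2(x)\big)\pi(x)\,dx \int_{[-1,1]} \frac{\big(1 + (1 - s(x))^2\big)\, x^2}{2\big(1 - (1 - s(x))^2\big)^2}\,dx$$
$$= C \int_{[-1,1]} \frac{\big(1 + (1 - s(x))^2\big)\, x^2}{2\big(1 - |x| - 2\sqrt{1 - |x|}\big)^2}\,dx$$
$$\ge C \int_{[-1,1]} \frac{x^2}{8(1 - |x|)}\,dx = \infty,$$

where $C := \int_{[-1,1]} \big(2s(x) - s^2(x)\big)\pi(x)\,dx > 0$.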
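
The integrand in the computation above rests on the second moment of a geometric random variable. As a short worked step (our own spelling-out, with the shorthand $p(x) := 1 - (1 - s(x))^2$ introduced here and not used in the source): conditionally on $X_0 = x$, the variable $\tau + 1$ is geometric on $\{1, 2, \dots\}$ with success probability $p(x)$, so

```latex
\mathbb{E}\big[(\tau+1)^2 \mid X_0 = x\big]
  = \sum_{m=1}^{\infty} m^2\, p(x)\,\big(1-p(x)\big)^{m-1}
  = \frac{2 - p(x)}{p(x)^2}
  = \frac{1 + (1-s(x))^2}{\big(1-(1-s(x))^2\big)^2},
\qquad
p(x)^2 = \big(2s(x) - s^2(x)\big)^2 \le 4\,s^2(x) = 4\,(1-|x|).
```

The factor $\frac{1}{2}$ is the density of $U$ on $[-1, 1]$, and the bound on $p(x)^2$ together with $1 + (1 - s(x))^2 \ge 1$ gives the non-integrable lower bound $x^2 / (8(1 - |x|))$ near $|x| = 1$.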



3 Barker's algorithm 

When assessing the efficiency of Markov chain Monte Carlo algorithms, the asymptotic
variance criterion is one of the natural choices. Peskun ordering [23] (see also
[29, 21]) provides a tool to compare two reversible transition kernels $P_1$, $P_2$ with
the same limiting distribution $\pi$ and is defined as follows: $P_1 \succeq P_2$ if and only if for
$\pi$-almost every $x \in \mathcal{X}$ and all $A \in \mathcal{B}(\mathcal{X})$ we have $P_1(x, A \setminus \{x\}) \ge P_2(x, A \setminus \{x\})$.
If $P_1 \succeq P_2$ then $\sigma^2_{f,P_1} \le \sigma^2_{f,P_2}$ for every $f \in L^2(\pi)$.

Consider now a class of algorithms where the transition kernel $P$ is defined
by applying an irreducible proposal kernel $Q$ and an acceptance rule $\alpha$, i.e.
given $X_n = x$, the value of $X_{n+1}$ is a result of performing the following two
steps.

1. Draw a proposal $y \sim Q(x, \cdot)$,

2. Set $X_{n+1} := y$ with probability $\alpha(x, y)$ and $X_{n+1} := x$ otherwise,

where $\alpha(x, y)$ is such that the resulting kernel $P$ is reversible with stationary
distribution $\pi$. It follows [23, 29] that for a given proposal kernel $Q$ the standard
Metropolis-Hastings [19,12] acceptance rule

$$\alpha^{(MH)}(x, y) = \min\Big\{1, \frac{\pi(y) q(y, x)}{\pi(x) q(x, y)}\Big\} \qquad (18)$$

yields a transition kernel $P^{(MH)}$ that is maximal with respect to Peskun ordering
and thus minimal with respect to asymptotic variance. In particular,
Barker's algorithm [2] that uses the acceptance rule

$$\alpha^{(B)}(x, y) = \frac{\pi(y) q(y, x)}{\pi(y) q(y, x) + \pi(x) q(x, y)} \qquad (19)$$

is inferior to Metropolis-Hastings when the asymptotic variance is considered.
In the above notation we assume that all the involved distributions have a common
dominating measure and $q(x, \cdot)$ are transition densities of $Q$. See [29]
for a more general statement and discussion.
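
The two-step scheme and the acceptance rules (18) and (19) can be sketched as follows. This is a minimal illustration rather than a reference implementation: the function names are ours, densities are passed as plain numbers, and $\pi(x) q(x, y) > 0$ is assumed so that the ratios are well defined.

```python
import math
import random

def mh_acceptance(pi_x, pi_y, q_xy, q_yx):
    """Metropolis-Hastings rule (18): min{1, pi(y) q(y,x) / (pi(x) q(x,y))}."""
    return min(1.0, (pi_y * q_yx) / (pi_x * q_xy))

def barker_acceptance(pi_x, pi_y, q_xy, q_yx):
    """Barker rule (19): pi(y) q(y,x) / (pi(y) q(y,x) + pi(x) q(x,y))."""
    return (pi_y * q_yx) / (pi_y * q_yx + pi_x * q_xy)

def accept_reject_step(x, density, propose, q, acceptance, rng):
    """Step 1: draw y ~ Q(x, .). Step 2: move to y with probability alpha(x, y)."""
    y = propose(x, rng)
    a = acceptance(density(x), density(y), q(x, y), q(y, x))
    return y if rng.random() < a else x

# Example setting (an assumption): unnormalized N(0, 1) target, U([-2, 2]) increments.
density = lambda x: math.exp(-0.5 * x * x)
propose = lambda x, rng: x + rng.uniform(-2.0, 2.0)
q = lambda x, y: 0.25                      # symmetric proposal density on its support

a_mh = mh_acceptance(density(0.0), density(1.0), q(0.0, 1.0), q(1.0, 0.0))
a_barker = barker_acceptance(density(0.0), density(1.0), q(0.0, 1.0), q(1.0, 0.0))
x_next = accept_reject_step(0.0, density, propose, q, barker_acceptance, random.Random(0))
```

Only ratios of $\pi$ enter both rules, so an unnormalized target density suffices.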

Exact Algorithms introduced in [7,8,5,6] allow for inference in diffusion
models without Euler discretization error. In recent advances in exact MCMC
inference for complex diffusion models a particular setting keeps recurring, where
the Metropolis-Hastings acceptance step requires a specific Bernoulli Factory
and is not possible to execute. However, in this diffusion context Barker's
algorithm (19) is feasible, as well as the 'lazy' version of the
Metropolis-Hastings kernel

$$P^{(MH)}_\varepsilon := \varepsilon\,\mathrm{Id} + (1 - \varepsilon) P^{(MH)}. \qquad (20)$$

We refer to [10,18,16] for the background on exact MCMC inference for diffusions
and the Bernoulli Factory problem. This motivates us to investigate the
performance of these alternatives in comparison to the standard
Metropolis-Hastings algorithm.

Theorem 4 Let $P^{(B)}$ denote the transition kernel of Barker's algorithm
and let $P^{(MH)}$ and $P^{(MH)}_\varepsilon$ be as defined in (20). If the CLT (1) holds for $f$
and $P^{(MH)}$, then it holds also for

(i) $f$ and $P^{(MH)}_\varepsilon$ with

$$\sigma^2_{f,P^{(MH)}_\varepsilon} = \frac{1}{1-\varepsilon}\,\sigma^2_{f,P^{(MH)}} + \frac{\varepsilon}{1-\varepsilon}\,\sigma^2_f. \qquad (21)$$

(ii) $f$ and $P^{(B)}$ with

$$\sigma^2_{f,P^{(MH)}} \le \sigma^2_{f,P^{(B)}} \le \sigma^2_{f,P^{(MH)}_{1/2}} = 2\,\sigma^2_{f,P^{(MH)}} + \sigma^2_f. \qquad (22)$$

Proof The first claim (i) is a restatement of Corollary 1 for Metropolis-Hastings
chains. To obtain the second claim (ii), note that $P^{(MH)}_{1/2}$ can be viewed as an
algorithm that uses proposals from $Q$ and the acceptance rule

$$\alpha(x, y) = \min\Big\{\frac{1}{2}, \frac{\pi(y) q(y, x)}{2\pi(x) q(x, y)}\Big\}.$$

Now since

$$\min\Big\{1, \frac{\pi(y) q(y, x)}{\pi(x) q(x, y)}\Big\} \ge \frac{\pi(y) q(y, x)}{\pi(y) q(y, x) + \pi(x) q(x, y)} \ge \min\Big\{\frac{1}{2}, \frac{\pi(y) q(y, x)}{2\pi(x) q(x, y)}\Big\},$$

the result follows from Peskun ordering and Corollary 1.
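
Writing $r := \pi(y) q(y, x) / (\pi(x) q(x, y))$, the chain of inequalities in the proof reads $\min\{1, r\} \ge r/(1+r) \ge \min\{1/2, r/2\}$: for $r \le 1$ one has $r/(1+r) \ge r/2$, and for $r \ge 1$ one has $r/(1+r) \ge 1/2$. A small numerical check of this reformulation, with the function name and the grid of ratios chosen by us:

```python
def acceptance_chain(r):
    """Acceptance probabilities as functions of r = pi(y)q(y,x) / (pi(x)q(x,y))."""
    mh = min(1.0, r)              # Metropolis-Hastings rule (18)
    barker = r / (1.0 + r)        # Barker rule (19)
    lazy_mh = min(0.5, r / 2.0)   # lazy Metropolis-Hastings with eps = 1/2
    return mh, barker, lazy_mh

ratios = [10.0 ** e for e in range(-6, 7)]
chain_holds = all(mh >= b >= lazy for mh, b, lazy in map(acceptance_chain, ratios))
```

Equality between the Barker and lazy rules occurs at $r = 1$, where both equal $1/2$.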



4 Numerical Examples 

To illustrate the theoretical findings, we consider two numerical examples. 
The first focuses on time sampling, the second on efficiency of the Barker's 
algorithm. 



4.1 Time sampled contracting normals 

Consider the contracting normals example, i.e. a Markov chain with transition
probabilities

$$P(x, \cdot) = N(\theta x, 1 - \theta^2) \qquad (23)$$

for some $\theta \in (-1, 1)$. It is easy to check that the stationary distribution is
$\pi(\cdot) = N(0, 1)$. Moreover the transition kernel is geometrically ergodic and
reversible for all $\theta \in (-1, 1)$ and also positive for $\theta \in [0, 1)$, [3,17]. For the
target function we take $f(x) = x$ and estimate the asymptotic variance using
the batch means estimator of [14] based on trajectories of length 10^. We
set $\theta$ to 0.9 and $-0.9$ in the following settings:

- CN: Contracting normals;

- LCN: Lazy contracting normals with $\varepsilon = 0.5$;

- TSCN1: Time sampled contracting normals for sampling distribution
  $\mu = 1 + \mathrm{Pois}(1)$;

- TSCN2: Time sampled contracting normals for sampling distribution
  $\mu = 1 + \mathrm{Pois}(5)$.





                   CN       LCN      TSCN1    TSCN2
$\theta = 0.9$     19.1     38.5     9.28     3.43
$\theta = -0.9$    0.053    1.14     0.80     0.96

Table 1. Estimated asymptotic variance of the contracting normals Markov chain for
different sampling scenarios.

The first two columns of Table 1 report how laziness increases asymptotic
variance and illustrate Corollary 1. Note that the stationary variance $\sigma^2_f = 1$
is substantial compared to the asymptotic variance of contracting normals for
$\theta = -0.9$ and thus the lazy version LCN becomes severely inefficient compared
to CN. The stochastic ordering of the sampling distributions in the above
scenarios is LCN $\le_{st}$ CN $\le_{st}$ TSCN1 $\le_{st}$ TSCN2, therefore the simulation
shows how the asymptotic variance decreases for stochastically bigger sampling
distributions (Corollary 2) in the case of positive operators ($\theta = 0.9$) and how this
property fails if the operator is not positive, i.e. for $\theta = -0.9$.
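
The estimates in Table 1 can be compared with values predicted by formula (6). The following is our own derivation, not stated in the source: for $f(x) = x$ and the kernel (23) one has $Pf(x) = \theta x$, so $f$ is an eigenfunction of $P$ with eigenvalue $\theta$, and since $\sigma^2_f = 1$ the spectral measure $E_{f,P}$ is a unit point mass at $\theta$. Formula (6) then reduces to $(1 + G_\mu(\theta)) / (1 - G_\mu(\theta))$. A sketch under these assumptions, with hypothetical names:

```python
import math

def variance_from_pgf(theta, pgf):
    """Formula (6) when E_{f,P} is a unit point mass at theta."""
    g = pgf(theta)
    return (1.0 + g) / (1.0 - g)

# PGFs of the four sampling distributions used in Table 1.
scenarios = {
    "CN": lambda x: x,                                  # mu = delta_1
    "LCN": lambda x: 0.5 + 0.5 * x,                     # mu(0) = mu(1) = 1/2
    "TSCN1": lambda x: x * math.exp(1.0 * (x - 1.0)),   # mu = 1 + Pois(1)
    "TSCN2": lambda x: x * math.exp(5.0 * (x - 1.0)),   # mu = 1 + Pois(5)
}
theoretical = {
    theta: {name: variance_from_pgf(theta, pgf) for name, pgf in scenarios.items()}
    for theta in (0.9, -0.9)
}
```

Under these assumptions the predicted values are roughly 19, 39, 9.8 and 3.4 for $\theta = 0.9$, and roughly 0.053, 1.11, 0.76 and 1.00 for $\theta = -0.9$, which is broadly in line with the simulated entries of Table 1.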

4.2 Efficiency of Barker's algorithm

We compare the estimated asymptotic variance of the random walk Metropolis
algorithm, Barker's algorithm and the lazy version of the random walk
Metropolis algorithm with $\varepsilon = 0.5$ to illustrate the bounds of Theorem 4. For the
stationary distribution we take $N(0, 1)$ and the increment proposal is $U([-2, 2])$.
The results based on a simulation length lO'' are reported in Table 2. 





                       Metropolis   Barker's   lazy Metropolis
asymptotic variance    3.69         5.67       8.32

Table 2. Estimated asymptotic variance of the Metropolis, Barker's and lazy Metropolis
algorithms.
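
A comparison of this kind can be reproduced along the following lines. This is a sketch under stated assumptions, not the authors' code: we take $f(x) = x$ (the source does not name the target function for this example), use a plain batch means estimator with about $\sqrt{n}$ batches (which may differ from the variant in [14]), and pick the run length and seed arbitrarily.

```python
import math
import random

def log_target(x):
    return -0.5 * x * x            # N(0, 1) up to an additive constant

def run_chain(n, rule, eps, rng):
    """Random walk chain with U([-2, 2]) increments.

    rule is "metropolis" or "barker"; eps is the probability of a forced lazy step.
    """
    x, path = 0.0, []
    for _ in range(n):
        if rng.random() >= eps:    # with probability 1 - eps attempt a move
            y = x + rng.uniform(-2.0, 2.0)
            ratio = math.exp(log_target(y) - log_target(x))
            accept = min(1.0, ratio) if rule == "metropolis" else ratio / (1.0 + ratio)
            if rng.random() < accept:
                x = y
        path.append(x)
    return path

def batch_means_variance(path):
    """Batch means estimate of the asymptotic variance for f(x) = x."""
    n = len(path)
    b = int(math.sqrt(n))          # batch length
    m = n // b                     # number of batches
    means = [sum(path[i * b:(i + 1) * b]) / b for i in range(m)]
    grand = sum(means) / m
    return b * sum((v - grand) ** 2 for v in means) / (m - 1)

rng = random.Random(1)
n = 200_000
estimates = {
    "Metropolis": batch_means_variance(run_chain(n, "metropolis", 0.0, rng)),
    "Barker's": batch_means_variance(run_chain(n, "barker", 0.0, rng)),
    "lazy Metropolis": batch_means_variance(run_chain(n, "metropolis", 0.5, rng)),
}
```

Up to Monte Carlo error the three estimates should respect the ordering in (22), with the lazy Metropolis value close to twice the Metropolis value plus $\sigma^2_f = 1$.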